Breaking the Closed-World Assumption in Stylometric Authorship Attribution
Abstract
Flow of the Classify-Verify algorithm on a test document D and a suspect set A, with optional acceptance threshold t and in-set probability p.

The Classify-Verify Algorithm

Input: Document D, suspect author set A = {A_1, ..., A_n}, target measure to maximize μ
Optional: in-set probability p, manual threshold t
Output: A_D if A_D ∈ A, and ⊥ otherwise

C_A ← classifier trained on A
V_A = {V_A1, ..., V_An} ← verifiers trained on A
if t, p not set then
    t ← threshold maximizing p-μR of Classify-Verify cross-validation on A
else if t not set then
    t ← threshold maximizing p-μ of Classify-Verify cross-validation on A
end if
A ← C_A(D)
if V_A(D, t) = True then
    return A
else
    return ⊥
end if
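The control flow of the algorithm can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Classifier` and `Verifier` callable interfaces, the `select_threshold` helper, and the toy average-word-length feature with its `PROFILES` values are all hypothetical stand-ins. In particular, `cv_score` stands in for the cross-validated p-μ (or p-μR) measure, which the text above does not define in detail, and ⊥ is represented by `None`.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical interfaces (assumptions, not taken from the source):
# a classifier maps a document to its most likely author in the suspect set,
# and a verifier decides whether to accept that author at threshold t.
Classifier = Callable[[str], str]
Verifier = Callable[[str, float], bool]


def select_threshold(candidates: Iterable[float],
                     cv_score: Callable[[float], float]) -> float:
    """Return the candidate threshold with the highest cross-validation score.

    cv_score is a placeholder for the p-mu or p-muR measure in the pseudocode.
    """
    return max(candidates, key=cv_score)


def classify_verify(document: str,
                    classifier: Classifier,
                    verifiers: Dict[str, Verifier],
                    threshold: float) -> Optional[str]:
    """Classify first, then verify; None plays the role of the bottom symbol."""
    candidate = classifier(document)                  # A <- C_A(D)
    if verifiers[candidate](document, threshold):     # V_A(D, t) = True
        return candidate
    return None


# Toy stand-ins so the sketch runs end to end (purely illustrative).
def avg_word_length(text: str) -> float:
    words = text.split()
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


# Made-up author profiles: typical average word length per suspect.
PROFILES = {"alice": 3.0, "bob": 6.0}


def toy_classifier(document: str) -> str:
    # Pick the suspect whose profile is nearest to the document's feature.
    x = avg_word_length(document)
    return min(PROFILES, key=lambda author: abs(x - PROFILES[author]))


def make_toy_verifier(author: str) -> Verifier:
    # Assumption: accept when the feature distance is within the threshold.
    def verify(document: str, threshold: float) -> bool:
        return abs(avg_word_length(document) - PROFILES[author]) <= threshold
    return verify


TOY_VERIFIERS = {author: make_toy_verifier(author) for author in PROFILES}
```

With these toy components, `classify_verify("the cat sat on the mat", toy_classifier, TOY_VERIFIERS, 0.5)` returns `"alice"`, while a document made of much longer words is classified as `"bob"` but rejected by the verifier, so the function returns `None`. Keeping the verification step separate from the classifier is what lets the procedure answer "none of the suspects" in the open-world setting.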
Similar resources
Classify, but Verify: Breaking the Closed-World Assumption in Stylometric Authorship Attribution
Forensic stylometry is a form of authorship attribution that relies on the linguistic information found in a document. While there has been significant work in stylometry, most research focuses on the closed-world problem where the document’s author is in a known suspect set. For open-world problems where the author may not be in the suspect set, traditional methods used in classification are i...
Authorship Attribution Using Text Distortion
Authorship attribution is associated with important applications in forensics and humanities research. A crucial point in this field is to quantify the personal style of writing, ideally in a way that is not affected by changes in topic or genre. In this paper, we present a novel method that enhances authorship attribution effectiveness by introducing a text distortion step before extracting st...
Domain Independent Authorship Attribution without Domain Adaptation
Automatic authorship attribution, by its nature, is much more advantageous if it is domain (i.e., topic and/or genre) independent. That is, many real world problems that require authorship attribution may not have in-domain training data readily available. However, most previous work based on machine learning techniques focused only on in-domain text for authorship attribution. In this paper, w...
Investigating Topic Influence in Authorship Attribution
The aim of this paper is to explore text topic influence in authorship attribution. Specifically, we test the widely accepted belief that stylometric variables commonly used in authorship attribution are topic-neutral and can be used in multi-topic corpora. In order to investigate this hypothesis, we created a special corpus, which was controlled for topic and author simultaneously. The corpus ...
The Key Factors and Their Influence in Authorship Attribution
Authorship attribution has a long history dating back to the 19th century. Existing studies have used different sets of stylometric features and computational methodologies on a variety of corpora with different lengths and genres. This study presents a protocol to perform a systematic literature review (SLR) to identify the best combination of stylometric features and computational methodology. Spec...
Publication date: 2014